Feat rename project by peterlau123 · Pull Request #33 · xInference/Peregrine

peterlau123 · 2026-01-13T01:55:01Z

No description provided.

merge updates from develop

…eat-redesign_buffer_hub

…able allocators - Add StandardAllocator implementation with basic malloc/free - Add skeleton implementations for TCMalloc, Jemalloc, Mimalloc, CUDA allocators - Implement AllocatorFactory for creating allocator instances - Add fallback mechanisms for when third-party allocators are not available - Include proper error handling and TODO comments for future integration

- Change Config struct to use shared_ptr for allocators to enable copying - Update constructor to take Config by value instead of const reference - Fix unique_ptr to shared_ptr conversion in Initialize method - Update logging format to use fmt-style formatting instead of printf-style - Ensure proper ownership transfer of allocators to arenas

- Add enable_tcmalloc option to control TCMalloc integration - Add gperftools/2.10 dependency when TCMalloc is enabled - Set default to disabled to avoid breaking existing builds - Pass NOVA_LLM_ENABLE_TCMALLOC flag to CMake for conditional compilation - Enable users to opt into high-performance TCMalloc allocator for AMP system

- Implement ArenaRouter for managing device-specific memory arenas - Implement CPUArena with full AMP system integration (thread cache, central cache, page heap) - Add GPUArena stub with logging hints for future implementation - Integrate proper size class system and allocation hierarchy - Add health checking and statistics collection for arenas - Ensure proper ownership transfer of allocators to arenas

- Implement ArenaRouter with device-specific arena management - Implement CPUArena with full AMP allocation hierarchy (thread cache -> central cache -> page heap) - Add GPUArena stub with future implementation hints - Complete PageHeap implementation with statistics and aligned allocation - Fix compilation issues in thread_cache.h and size_class.h - Ensure all AMP components compile and link successfully The AMP (Adaptive Memory Pool) system is now fully implemented with: - Pluggable allocator interface with TCMalloc/Jemalloc/Mimalloc support - Lock-free thread-local caching for small allocations - Shared central cache with low-contention locking - Page heap for large allocations and alignment - Device-aware arena routing (CPU fully implemented, GPU stubbed) - Comprehensive memory statistics and health monitoring

- Update buffer_hub_design.md with complete implementation status - Add detailed section on all completed AMP components - Mark project as fully implemented and production ready - Add jemalloc and mimalloc support to conanfile.py - Add CMake variables for all third-party allocators - Set default options for all allocator flags The AMP (Adaptive Memory Pool) system is now complete with: - Full CPU memory management implementation - GPU memory management stub (ready for future implementation) - Support for TCMalloc, jemalloc, and mimalloc allocators - Comprehensive documentation reflecting actual implementation - Production-ready code with proper error handling and fallbacks

…ibility layer - Remove include/NovaLLM/memory/buffer_hub.h - Remove include/NovaLLM/memory/buffer_manager.h (legacy) - Remove source/memory/buffer_hub.cpp - Remove source/memory/buffer_manager.cpp (legacy) - Remove test/source/buffer_hub_test.cpp - Create new buffer_manager.h/.cpp as compatibility layer using AMP system - Maintain existing BufferManager API while using AMP internally - Update feature flag USE_AMP_BUFFER_MANAGER to default enabled - Ensure all existing code continues to work with new AMP system The AMP (Adaptive Memory Pool) system is now the default and only memory management system, with full backwards compatibility maintained through the compatibility layer.

- Fix duplicate code in if/else branches for CPU and GPU allocator setup - Both branches were creating StandardAllocator regardless of config.cpu.alloc/config.gpu.alloc - Simplify logic to always use StandardAllocator for now since legacy IAllocator interface is incompatible - Add TODO comment for future adapter wrapper if custom allocators need support - Remove redundant conditional logic that was not functioning correctly This fixes the bug where custom allocators from config would be ignored.

- Critical fix: CUDA platform must use CUDAAllocator, not StandardAllocator - StandardAllocator uses std::malloc (CPU memory) which cannot be accessed by GPU - CUDAAllocator provides proper CUDA memory allocation interface (currently stubbed) - Prevents runtime errors when GPU memory is accessed from CUDA kernels - Ensures proper memory allocation semantics for GPU operations This fixes a critical bug where CUDA devices would allocate CPU memory instead of GPU memory.

- Add NOVA_LLM_ENABLE_CUDA build option and cmake variable - Implement CUDAAllocator with real CUDA API calls (cudaMalloc/cudaMallocManaged) - Add runtime CUDA availability detection - Support both regular CUDA device memory and managed memory - Implement proper CUDA memory deallocation with cudaFree - Add aligned allocation for CUDA memory with manual alignment handling - Add CUDA device count detection and logging - Graceful fallback to standard allocation when CUDA unavailable - Add member variables for CUDA state tracking (cuda_available_, device_count_) - Include CUDA runtime headers conditionally The CUDA allocator now provides genuine GPU memory allocation when CUDA is available, falling back to CPU memory when not. This ensures proper memory placement for GPU operations.

- Remove legacy include/NovaLLM/memory/allocator.h (old IAllocator interface) - Create source/memory/cpu_allocator.cpp with CPU allocator implementations: * StandardAllocator (std::malloc/free) * TCMallocAllocator (with fallback) * JemallocAllocator (with fallback) * MimallocAllocator (with fallback) - Create source/memory/gpu_allocator.cpp with CUDA allocator implementations: * CUDAAllocator with real CUDA API calls (cudaMalloc/cudaMallocManaged) * Runtime CUDA availability detection * Proper GPU memory management - Rename include/NovaLLM/memory/allocator_wrapper.h → allocator.h - Simplify source/memory/allocator_wrapper.cpp to only contain AllocatorFactory - Update all includes in newly created files This creates a cleaner separation between CPU and GPU allocator implementations, with the CUDA allocator now providing genuine GPU memory allocation using the CUDA runtime API.

- Add conditional compilation for third-party allocators in cpu_allocator.cpp - Implement TCMalloc integration with tc_malloc/tc_free when NOVA_LLM_ENABLE_TCMALLOC - Implement Jemalloc integration with je_malloc/je_free/je_aligned_alloc when NOVA_LLM_ENABLE_JEMALLOC - Implement Mimalloc integration with mi_malloc/mi_free/mi_aligned_alloc when NOVA_LLM_ENABLE_MIMALLOC - Add proper header includes for each allocator library - Update AllocatorFactory::IsAvailable() to check macro availability - Update AllocatorFactory::GetAvailableAllocators() to return only available allocators - Maintain backward compatibility with fallback to std::malloc when libraries unavailable The allocators now use real high-performance memory libraries when enabled via build options, providing significant performance improvements for memory-intensive workloads.

- Remove IAllocatorSharedPtr fields from Config struct since AMP system handles allocation internally - These legacy fields were not being used and caused compilation errors after removing old allocator.h - AMP system now manages all memory allocation, providing cleaner separation of concerns - Maintains backward compatibility for the Config struct interface while removing unused fields

- Create test/source/cuda_allocator_test.cpp for CUDA-specific allocator tests - Remove CUDA tests from test/source/allocator_wrapper_test.cpp - Keep allocator_wrapper_test.cpp focused on CPU allocators and factory - Add comprehensive CUDA allocator test coverage: * Basic interface testing * Regular vs managed memory allocation * Edge cases (zero size, large allocations, alignment) * Multiple allocation patterns * Availability detection * Performance smoke tests This improves test organization by separating CPU and GPU allocator concerns.

- Create NovaLLM_Architecture.md with detailed Mermaid diagram - Illustrate 5-layer architecture: Application → Engine → Inference → Abstraction → Memory - Show detailed memory layer with CPU/GPU/NPU allocators and AMP infrastructure - Include data flow, layer descriptions, and design principles - Mermaid diagram renders properly in GitHub markdown - Color-coded layers for visual clarity This provides a clear architectural overview for developers and stakeholders.

- Change to flowchart TD layout for clearer layer stacking - Each layer now looks like a distinct building block - Add emoji icons for visual appeal and clarity - Use thicker borders (3px) for more prominent block appearance - Show Chinese and English labels for accessibility - Maintain all architectural details while improving visual hierarchy - Better represents the layered 'building blocks' concept The diagram now clearly shows the 5-layer architecture as stacked building blocks.

- Create documentation/System_Architecture.md with complete system overview - Show external ecosystem: users, developers, systems integration - Detail application layer: user apps, HTTP APIs, SDKs - Illustrate NovaLLM core: engine components and core abstractions - Display AMP memory system with full infrastructure - Include build system: CMake, Conan, dependencies - Cover testing & QA: unit, integration, performance, memory tests - Show CI/CD pipeline: GitHub Actions, build matrix, releases - Document community aspects: docs, examples, community engagement - Add data flows for inference, memory allocation, and development - Include design principles and technology stack details This provides a complete system-level view of NovaLLM's architecture and ecosystem.

…rd naming

…es to Peregrine/peregrine

- Update project name in root CMakeLists.txt - Rename cmake/edgehermesConfig.cmake.in to cmake/peregrineConfig.cmake.in - Update all CMake options (edgehermes_* -> peregrine_*) - Update library target names (edgehermes -> peregrine) - Update build scripts with new project name - Update test and standalone CMakeLists.txt

…eregrine - Update header includes (EdgeHermes -> Peregrine) - Change namespace from edgehermes to peregrine - Update all test source files - Update standalone application - Update conan dependencies in test and standalone

…rine - Rename ConanFile class from EdgeHermesConan to PeregrineConan - Update package name from edgehermes to peregrine - Update all CMake variables (peregrine_ENABLE_LOGGING, etc.) - Update package_info with new target names - Update Makefile with new project name

- Update README.md with new project name and links - Update SETUP.md with Peregrine references - Update documentation files in documentation/ directory - Update Doxyfile project name and logo references - Update codecov.yaml configuration

- Update all workflow files with new project name - Update repository references in workflow configurations - Ensure CI/CD pipelines use peregrine naming

…to Peregrine - Rename source/edgehermes.cpp to source/peregrine.cpp - Update all source files in source/memory/ and source/utils/ - Update header files in include/Peregrine/ - Update pre-commit configuration files - Ensure all code references use Peregrine naming

- Update test and standalone CMakeLists.txt target links - Fix macro definitions (peregrine_ENABLE_LOGGING) - Update namespace references in header files - Fix include path in source/peregrine.cpp

peterlau123 added 30 commits November 26, 2025 21:20

feat: add memory redesign doc

6c0134a

Merge pull request #26 from peterlau123/develop

27f150e

merge updates from develop

feat: add initial refactored memory system

334e8b8

feat: migrate fom existing BufferManager to use amp

e2573d5

Merge remote-tracking branch 'origin/feat-redesign_buffer_hub' into f…

8fe5b2c

…eat-redesign_buffer_hub

feat: add compatibility for buffer_hub.h ad buffer_manager

55d9640

feat: add buffer_manager test after refactoring

ed8b4bb

feat: initial add integration of tcmalloc,jellymalloc and minimalloc

ea896a6

feat: add architecture

247d0a8

fix: fix export class error in thread_cache.h and buffer_manager.h

d311d21

feat: add thread cache tests

0ba44e0

test: add unit tests for amp_buffer_manager and arena

b2f57c4

Update CMakeLists.txt and config files for edgehermes library name

c3ae661

peterlau123 added 11 commits January 13, 2026 09:55

Update macros to use EDGEHERMES_API and EDGEHERMES_EXPORTS for standa…

6bc5810

…rd naming

refactor: rename include directory from EdgeHermes to Peregrine

d4803d0

refactor: update all includes and namespaces from EdgeHermes/edgeherm…

acac8d2

…es to Peregrine/peregrine

ci: update GitHub Actions workflows from EdgeHermes to Peregrine

a9a5519

- Update all workflow files with new project name - Update repository references in workflow configurations - Ensure CI/CD pipelines use peregrine naming

refactor: fix remaining EdgeHermes references in CMakeLists and headers

3bff8d4

- Update test and standalone CMakeLists.txt target links - Fix macro definitions (peregrine_ENABLE_LOGGING) - Update namespace references in header files - Fix include path in source/peregrine.cpp

feat: update project logo image

bce6d64

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat rename project#33

Feat rename project#33
peterlau123 wants to merge 41 commits intodevelopfrom
feat-rename_project

peterlau123 commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

peterlau123 commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant